OMG U got flu? Analysis of shared health messages for bio-surveillance

نویسندگان

  • Nigel Collier
  • Nguyen Truong Son
  • Mai Nguyen Thi Ngoc
چکیده

BACKGROUND Micro-blogging services such as Twitter offer the potential to crowdsource epidemics in real-time. However, Twitter posts ('tweets') are often ambiguous and reactive to media trends. In order to ground user messages in epidemic response we focused on tracking reports of self-protective behaviour such as avoiding public gatherings or increased sanitation as the basis for further risk analysis. RESULTS We created guidelines for tagging self protective behaviour based on Jones and Salathé (2009)'s behaviour response survey. Applying the guidelines to a corpus of 5283 Twitter messages related to influenza like illness showed a high level of inter-annotator agreement (kappa 0.86). We employed supervised learning using unigrams, bigrams and regular expressions as features with two supervised classifiers (SVM and Naive Bayes) to classify tweets into 4 self-reported protective behaviour categories plus a self-reported diagnosis. In addition to classification performance we report moderately strong Spearman's Rho correlation by comparing classifier output against WHO/NREVSS laboratory data for A(H1N1) in the USA during the 2009-2010 influenza season. CONCLUSIONS The study adds to evidence supporting a high degree of correlation between pre-diagnostic social media signals and diagnostic influenza case data, pointing the way towards low cost sensor networks. We believe that the signals we have modelled may be applicable to a wide range of diseases.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Model for Mining Public Health Topics from Twitter

We present the Ailment Topic Aspect Model (ATAM), a new topic model for Twitter that associates symptoms, treatments and general words with diseases (ailments). We train ATAM on a new collection of 1.6 million tweets discussing numerous health related topics. ATAM isolates more coherent ailments, such as influenza, infections, obesity, as compared to standard topic models. Furthermore, ATAM mat...

متن کامل

A Review of Influenza Surveillance System in the Islamic Republic of Iran: History, Structures and Processes

Background and Objectives: Iran, like most other countries in the world, is always threatened with global epidemics and pandemics of influenza. The purpose of this study was to review the influenza surveillance system in Iran.   Methods: Data of this study were obtained from the surveillance system of the Center for Communicable Disease Control, the review of records, documents, books and pub...

متن کامل

Discovering Multi-Scale Co-Occurrence Patterns of Asthma and Influenza with Oak Ridge Bio-Surveillance Toolkit

We describe a data-driven unsupervised machine learning approach to extract geo-temporal co-occurrence patterns of asthma and the flu from large-scale electronic healthcare reimbursement claims (eHRC) datasets. Specifically, we examine the eHRC data from 2009 to 2010 pandemic H1N1 influenza season and analyze whether different geographic regions within the United States (US) showed an increase ...

متن کامل

Twitter mining for fine-grained syndromic surveillance

BACKGROUND Digital traces left on the Internet by web users, if properly aggregated and analyzed, can represent a huge information dataset able to inform syndromic surveillance systems in real time with data collected directly from individuals. Since people use everyday language rather than medical jargon (e.g. runny nose vs. respiratory distress), knowledge of patients' terminology is essentia...

متن کامل

Twitter Influenza Surveillance: Quantifying Seasonal Misdiagnosis Patterns and their Impact on Surveillance Estimates

BACKGROUND Influenza (flu) surveillance using Twitter data can potentially save lives and increase efficiency by providing governments and healthcare organizations with greater situational awareness. However, research is needed to determine the impact of Twitter users' misdiagnoses on surveillance estimates. OBJECTIVE This study establishes the importance of Twitter users' misdiagnoses by sho...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 2  شماره 

صفحات  -

تاریخ انتشار 2010